Ultrareliable fault-tolerant onboard digital systems for spacecraft intended for 
long mission life exploration of the outer planets are under development. The design 
of systems involving self-repair and fault-tolerance leads to the companion problem 
of quantifying and evaluating the survival probability of the system for the mission 
under consideration and the constraints imposed upon the system. Methods have 
been developed to (1) model self-repair and fault-tolerant organizations; (2) compute 
survival probability, mean life, and many other reliability predictive functions with 
respect to various systems and mission parameters; (3) perform sensitivity analysis of 
the system with respect to mission parameters; and (4) quantitatively compare 
competing fault-tolerant systems, for which various measures of comparison are offered. To 
automate the procedures of reliability mathematical modeling and evaluation, the 
CARE (computer-aided reliability estimation) program was developed. CARE is an 
interactive program residing on the UNIVAC 1108 system, which makes the above 
calculations and facilitates report preparation by providing output in tabular form 
and graphical 2-dimensional plots and 3-dimensional projections. The reliability 
estimation of fault-tolerant organizations by means of the CARE program is 
described in this article. 


Introduction 

The task of evaluating system performance of digital system architectures 
designed for long life or ultrareliability is a recurring one. The state of the 
art of fault-tolerant computing makes available to the designer various 
models or schemes which, by judiciously using protective redundancy, impart 
greater system probability of survival than would be possible by the use of 
simplex technology alone. One or more of these fault-tolerant schemes 
(triple modular redundancy (TMR), N-tuple modular redundancy, TMR/simplex 
redundancy, component redundancy, standby replacement, K-out-of-N 
systems, hybrid redundancy, and hybrid/simplex redundancy; see References 
1-5) in combination make up the architecture of fault-tolerant organizations. 
The overall reliability model takes into consideration the effect of 
variation in individual parameters of the basic schemes on the overall system 
reliability goals. This relationship is expressed as a mathematical 
function, often referred to as the reliability mathematical model of 
the system. The reliability evaluation task, once the system reliability 
mathematical model is known, may be: (1) to evaluate the system reliability 
given the values of the model parameters, or (2) to optimize the reliability 
objective by selecting optimum values of the model parameters. Since the 
number of combinations of the basic schemes and the range of possible 
values that the system parameters may take are very large, the 
decision was taken to automate the reliability evaluation procedure. This 
resulted in the development of a conversational computer program called 
CARE (computer-aided reliability estimation). 


Functional Description of CARE 

CARE’s purpose is to serve as a computer-aided reliability design tool to 
designers of ultrareliable fault-tolerant systems by facilitating reliability 
computation, data generation, and comparative evaluation. CARE consists 
of 4150 Fortran V statements designed to be run on the UNIVAC 1108 
under EXEC 8, version 11C (References 6 and 7). The results of the program 
are available in three forms: (1) as printouts, (2) as graphical 2-dimensional 
plots, and (3) as graphical 3-dimensional projections. 

CARE has three modes of operation: (1) “conversational” or interactive 
mode, (2) batch mode, and (3) remote-started batch mode. In the “conversational” 
mode, CARE may be interactively accessed by users from remote 
teletypes or other communication consoles to perform reliability analysis in 
“real time.” In the batch mode the job is submitted off-line and necessarily 
no dynamic changes to the user requirements can be made; this mode is 
expeditious when the user knows his needs exactly and hence need not spend 
time sitting at a console to input his queries. The remote-started batch mode 
is similar to the batch mode except that instead of submitting the job as a 
deck of punched cards the deck entry may be made via a console. 

Essentially, CARE consists of a repository of mathematical equations 
defining the various basic redundancy schemes. These equations may then, 
under program control, be interrelated to generate the desired mathematical 
model to fit the architecture of the system under evaluation. The 
mathematical model may then be supplied with ground instances of its 
variables and then evaluated to yield values for the specified independent 
variable or the mathematical model may be further manipulated so as to 
yield other reliability theoretic results. 


CARE’s Repository of Equations 

The equations residing in CARE model the following basic fault-tolerant 
organizations: 
(1) Hybrid-redundant (N,S) systems (see References 1 and 2). 

(a) NMR (N, 0) systems (see References 3 and 8). 

(b) TMR (3,0) systems (see Reference 3). 

(c) Cascaded or partitioned versions of the above systems. 

(d) Series string of the above systems. 

(2) Standby-sparing redundant (1,S) systems (see References 3 and 4). 

(a) K-out-of-N systems (see Reference 4). 

(b) Simplex systems. 

(c) Series string and cascaded versions of the above. 

(3) TMR systems with probabilistic compensating failures (see Reference 3). 

(a) Series string and cascaded versions of the above. 

(4) Hybrid/simplex redundant (3,S)sim systems (see References 5 and 9). 

(a) TMR/simplex systems (see Reference 4). 

(b) Series string and cascaded versions of the above. 

For the description of the above systems and their mathematical 
derivations, refer to the cited references. These equations are the most 
general representations of their systems, parameterized by mission time, failure 
rates, dormancy factors, coverage, number of spares, number of multiplexed 
units, number of cascaded units, and number of identical systems in series. 
The definitions of these parameters reside in CARE and may be optionally 
requested by the user (see Figure 1). More complex systems may be modeled 
by taking any of the above listed systems in series reliability with one 
another. 
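
As a minimal illustration of forming a more complex model as a series product, the following Python sketch multiplies block reliabilities. The simplex expression follows Equation 6 as listed in Figure 1, R(1,0) = (EXP(-LAMBDA*T))**(Z/W). The TMR expression 3R^2 - 2R^3 is the textbook form for a perfect voter and is an assumption here; it is not necessarily the form stored in CARE. All function names are hypothetical and are not part of CARE.

```python
import math


def simplex_reliability(lam, t, z=1, w=1):
    """Equation 6 as listed in Figure 1: (exp(-lam * t)) ** (z / w)."""
    return math.exp(-lam * t) ** (z / w)


def tmr_reliability(lam, t):
    """Textbook TMR with a perfect voter (assumed form): 3R^2 - 2R^3."""
    r = math.exp(-lam * t)
    return 3 * r**2 - 2 * r**3


def series_reliability(*block_reliabilities):
    """Series reliability: the system survives only if every block survives."""
    return math.prod(block_reliabilities)


# Illustrative values only: one TMR block in series with two identical
# simplex units (Z = 2), all with the same failure rate.
lam, t = 0.5, 1.0
system = series_reliability(tmr_reliability(lam, t),
                            simplex_reliability(lam, t, z=2))
print(f"system reliability at t = {t}: {system:.6f}")
```

The product form assumes that the blocks fail independently, which is the usual assumption behind a series reliability model.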

These reliability equations may be evaluated as a function of absolute 
mission time, normalized mission time, nonredundant system reliability, or 
any other system parameter that may be applicable. Among the various 
measures of reliability that the user may request for computation are: the 
system mean-life, the reliability at the mean-life, gain in reliability over a 
simplex system or some other competitive system, the reliability improvement 
factor, and the mission time availability for some minimum tolerable 
mission reliability. 
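
Two of these measures can be sketched in a few lines of Python. The reliability improvement factor follows the definition printed by CARE in Figure 1, SIMRIF = (1 - SIMREL)/(1 - REL). Taking the gain as the ratio REL/SIMREL is an assumption, although it is consistent with the sample values tabulated in Figure 1. The function names are hypothetical.

```python
import math


def simplex_gain(rel, simrel):
    """Gain over a simplex system, assumed here to be the ratio REL / SIMREL."""
    return rel / simrel


def simplex_rif(rel, simrel):
    """Reliability improvement factor: (1 - SIMREL) / (1 - REL)."""
    return (1.0 - simrel) / (1.0 - rel)


# REL at LAMT = .100 is taken from the sample output in Figure 1;
# SIMREL = ELAMT = exp(-LAMT) by the definitions listed there.
lamt = 0.100
rel = 0.9967989
simrel = math.exp(-lamt)

print(f"SIMGAIN = {simplex_gain(rel, simrel):.6f}")
print(f"SIMRIF  = {simplex_rif(rel, simrel):.5f}")
```

With these inputs the sketch agrees with the SIMGAIN and SIMRIF entries of the Figure 1 table to within the rounding of the printed values.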


Formulation of a Typical Problem for CARE 

A typical problem submitted for CARE analysis may be the following: 
Given a simplex system with 8 equal modules which is made fault-tolerant 
by providing two standby spares for each module, where each module has a 
constant failure rate of 0.5 failures per year and where the spares have an 
@XQT ATMAN.CARE

HELLO TERMINAL - I AM YOUR RELIABILITY ANALYST WITH THE
CARE (COMPUTER-AIDED RELIABILITY ESTIMATION) PACKAGE

DO YOU WISH TO HAVE YOUR ANSWERS TO THE QUESTIONS PRINTED BACK.
ANSWER YES OR NO
YES

DO YOU WISH TO KNOW THE DEFENITIONS OF RELIABILITY PARAMETERS AND TERMS,
ANSWER YES OR NO

THE DEFENITIONS OF THE VARIOUS RELIABILITY PARAMETERS
AND TERMS ARE AS FOLLOWS.

T = MISSION TIME.
R = SYSTEM RELIABILITY.
S = THE TOTAL NUMBER OF SPARES.
N = THE NUMBER OF MULTIPLEXED UNITS.
K = INVERSE DORMANCY FACTOR = (LAMBDA/MU).
C = COVERAGE FACTOR,
  = CONDITIONAL PROBABILITY OF SYSTEM RECOVERING GIVEN A FAILURE OCCURANCE.
Q = QUOTA, NUMBER OF IDENTICAL UNITS IN A SIMPLEX SYSTEM.
W = NUMBER OF CASCADED UNITS.
Z = NUMBER OF IDENTICAL SYSTEMS IN SERIES.
P = PROBABILITY OF A UNIT FAILING TO A LOGIC ZERO.
RV = RELIABILITY OF THE RESTORING ORGAN.
MU = UNPOWERED FAILURE RATE OF A SIMPLEX SYSTEM = K/LAMBDA.
LAMBDA = POWERED FAILURE RATE OF A SIMPLEX SYSTEM = K*MU.
LAMT = NORMALISED TIME = LAMBDA*MISSION TIME.
ELAMT = EXP(-LAMT).
REL = SYSTEM RELIABILITY.
UNREL = SYSTEM UNRELIABILITY = (1 - REL).
SIMREL = SIMPLEX RELIABILITY = ELAMT.
SIMGAIN = GAIN IN RELIABILITY WITH REFERENCE TO A SIMPLEX SYSTEM.
SIMRIF = RELIABILITY IMPROVEMENT FACTOR WITH REFERENCE TO A SIMPLEX SYSTEM
       = (1 - SIMREL)/(1 - REL).

DO YOU NEED INSTRUCTIONS FOR RUNNING THE CARE PROGRAM
ANSWER YES OR NO

SHORT COMMENT - THE CARE PROGRAM COMPUTES, WITH RESPECT TO THE
SELECTED EQUATIONS AND PARAMETERS THE FOLLOWING RELIABILITY
FUNCTIONS - THE RELIABILITY (REL), UNRELIABILITY (UNREL),
SIMPLEX RELIABILITY (SIMREL), SIMPLE GAIN (SIMGAIN), SIMPLE
RELIABILITY IMPROVEMENT FACTOR (SIMRIF), MEAN TIME TO FAILURE
(MTF), RELIABILITY AT THE MTF, RELIABILITY DIFFERENCE (DIFF),
RELIABILITY GAIN (GAIN), RELIABILITY IMPROVEMENT FACTOR (RIF),
SIMPLE MAXIMUM MISSION TIME (SIMTMAX), MAXIMUM MISSION TIME (TMAX),
SIMPLE TIME IMPROVEMENT FACTOR (SIMTIF), AND THE RATIO OF
TIME IMPROVEMENT FACTORS (RATIF).

2D AND SOME 3D PLOTS CAN BE OBTAINED FOR THE ABOVE COMPUTATIONS.
VARIOUS PLOTTING OPTIONS TO SPECIFY THE ABSCISSA, THE RANGE
OF ABSCISSA AND ORDINATE VALUES ARE AVAILABLE. ABILITY TO PLOT 3D
INTERSECTIONS OF 3D PROJECTIONS WITH 2D PLANES IS ALSO AVAILABLE.


Figure 1. A sample of CARE's question/answer capability 
THE CARE PROGRAM ALSO EVALUATES COMPLEX RELIABILITY FUNCTIONS
FORMED BY TAKING PRODUCTS OF THE BASIC RELIABILITY EQUATIONS.

THESE ARE TABULATED BELOW.

1. R(N,S) = F(T, LAMBDA, MU, S, N, K, RV, Z, W)
THIS IS THE GENERAL RELIABILITY EQUATION OF AN HYBRID-
REDUNDANT SYSTEM.

2. R(1,S) = F(T, LAMBDA, MU, S, K, Q, C, Z, W)
THIS IS THE GENERAL RELIABILITY EQUATION OF A STANDBY-
REPLACEMENT SYSTEM.

3. VOID

4. VOID

5. R(3,0) = F(T, LAMBDA, RV, Z, W, P)
THIS IS THE EQUATION FOR A TMR SYSTEM WHERE THE PROBABILITY
OF A UNIT FAILING TO LOGICAL ONE OR ZERO IS PARAMETERISED.

6. R(1,0) = (EXP(-LAMBDA*T))**(Z/W)
THIS IS A GENERAL EQUATION FOR A SIMPLEX SYSTEM.

7. DUMMY
THIS IS A DUMMY EQUATION WHICH IS ALL SET UP TO RECEIVE A NEW EQUATION.

8. BLANK

9. BLANK

10. BLANK

INSTRUCTIONS WILL BE GIVEN FOR ENTERING INPUT DATA
AT THE TIME THE INPUT DATA IS NEEDED BY THE PROGRAM.


DO YOU WISH TO FORM A PRODUCT OF RELIABILITIES
ANSWER YES OR NO
NO

TYPE IN COLUMN 1 THE NUMBER OF THE RELIABILITY EQUATION
TO BE USED - 1 THRU 7

INPUT VARIABLES FOR EQUATION 1

T, LAMT, OR ELAMT MUST BE SPECIFIED AND ITS VALUE
IS THE MAXIMUM VALUE FOR THAT VARIABLE. MIN IS THE MINIMUM
AND STEP IS THE INCREMENT FOR T, LAMT, OR ELAMT.

SOME VARIABLES THAT ARE NEEDED BY THE EQUATIONS ARE SET
EQUAL TO A DEFAULT VALUE IF THEY ARE NOT INPUTED. THESE
VARIABLES AND THEIR DEFAULT VALUES ARE: S=1, N=1, Z=1, W=1,
Q=1.0D0, C=.999D0, P=1.0D0, MIN=0.0D0,
STEP=1.0D0, AND ELAMT=1.0D0.

IF B IS INPUTED, THEN THIS VALUE IS USED AS THE FIRST
GUESS FOR THE UPPER LIMIT OF INTEGRATION IN THE CALCULATION
OF MTF.

IF OPTION=1, THEN DIFF, RIF, AND GAIN ARE CALCULATED FOR
ALL POSSIBLE COMBINATIONS OF THE PARAMETER. IF OPTION=2,
THEN DIFF, RIF, AND GAIN ARE CALCULATED FOR THE LAST TWO
PARAMETER VALUES. IF OPTION=3 OR IS NOT INPUTED, THEN THE
PROGRAM WILL ASK THE USER AS TO WHICH PARAMETER VALUES
DIFF, RIF, AND GAIN ARE TO BE CALCULATED.

NOTE: DIFF, RIF, AND GAIN ARE NOT COMPUTED IF THE USER IS
CALCULATING THE PRODUCT OF RELIABILITIES OR PLOTTING 3-D.


Figure 1 (contd) 
VAR AS THE NAMELIST NAME. A SAMPLE INPUT FOR EQUATION 5 FOLLOWS:
$VAR
T=12.0D0,
RV=1.0D0,
Z=1,
W=1,6,
OPTION=2,
B=10.0D0
$END

NOTE: NAMELIST INPUT IGNORES COLUMN 1.
THE INPUT VARIABLES ARE TYPED AS FOLLOWS
DOUBLE PRECISION: T, LAMT, ELAMT, MUT, LAMBDA, MU,
K, RV, Q, C, MIN, STEP, AND B
INTEGER: S, N, W, Z, AND OPTION

INPUT VARIABLES NOW

DO YOU WISH TO MAKE ALTERATIONS TO THE $VAR LIST
ANSWER YES OR NO

NO 

DO YOU WISH TO HAVE 2-D RELIABILITY PLOTS - ANSWER YES OR NO
YES

INPUT A 1 IN THE COLUMN SPECIFIED BELOW IF YOU WISH
THE CORRESPONDING PLOT OPTION, OTHERWISE INPUT 0.
NOTE: WHEN PERFORMING PRODUCT OF RELIABILITIES, NO OTHER
PLOT OPTION BESIDES PRODUCT OF RELIABILITIES MAY BE SPECIFIED.
COLUMN 1 - PLOTS PRODUCT OF RELIABILITIES
COLUMN 2 - PLOTS RELIABILITY
COLUMN 3 - PLOTS DIFF, RIF, AND GAIN
COLUMN 4 - PLOTS MTF AND RELIABILITY AT MTF
COLUMN 5 - PLOTS UNRELIABILITY
01100

FOR ABSCISSA, INPUT 1 IN COLUMN 1 IF ABSCISSA IS T,
1 IN COLUMN 2 IF ABSCISSA IS LOG(T) - BASE 10,
1 IN COLUMN 3 IF ABSCISSA IS LAMT,
1 IN COLUMN 4 IF ABSCISSA IS LOG(LAMT) - BASE 10,
1 IN COLUMN 5 IF ABSCISSA IS EXP(-LAMBDA*T),
1 IN COLUMN 6 IF ABSCISSA IS LOG(EXP(-LAMT)) - BASE 10.

»*1**» . ... 

IF YOU WISH TO PLOT A CERTAIN RANGE OF X-AXIS VALUES
FOR THE 2-D PLOTS, ENTER LEFT-END POINT IN COLUMNS 1-8 WITH
FORMAT F8.0 AND RIGHT-END POINT IN COLUMNS 9-16 WITH FORMAT F8.0,
OTHERWISE INPUT NO
NO

IF YOU WISH TO PLOT A CERTAIN RANGE OF Y-AXIS VALUES
FOR THE 2-D PLOTS, ENTER LEFT-END POINT IN COLUMNS 1-8 WITH
FORMAT F8.0 AND RIGHT-END POINT IN COLUMNS 9-16 WITH FORMAT F8.0,
OTHERWISE INPUT NO
NO

DO YOU WISH TO PLOT THE LOCUS OF RV SUCH THAT THE
SYSTEM RELIABILITY EQUALS THE UNIT RELIABILITY.
ANSWER YES OR NO
NO

DO YOU WISH TO HAVE 3-D RELIABILITY PLOTS - ANSWER YES OR NO
NO

DO YOU WISH TO CALCULATE MAXIMUM MISSION TIME AND SIMPLE TIME
FOR GIVEN RELIABILITY - ANSWER YES OR NO


Figure 1 (contd) 
DO YOU WANT PLOTS FOR THESE CALCULATIONS - ANSWER YES OR NO
YES

DO YOU WISH TO CALCULATE MAXIMUM MISSION TIME FOR
ANSWER YES OR NO
YES

INPUT IN COLUMN 1 ONE OF THE FOLLOWING THREE OPTIONS:
1. MAXIMUM MISSION TIME IS COMPARED AGAINST ALL POSSIBLE
COMBINATIONS OF THE PARAMETER.
PARAMETER VALUES.
3. THE PROGRAM ASKS THE USER AS TO WHICH PARAMETER VALUES
MAXIMUM MISSION TIME IS TO BE COMPARED.

DO YOU WANT PLOTS FOR THESE CALCULATIONS - ANSWER YES OR NO
ONLY THE FIRST 15 PARAMETER COMPARISONS

INPUT THE FOLLOWING 4 VARIABLES EACH WITH FORMAT F8.0
COLUMNS 1-8 - REFERENCE RELIABILITY R2
COLUMNS 9-16 - MINIMUM RELIABILITY R1
COLUMNS 25-32 - RELIABILITY R1 STEP SIZE
1.000 .000 1.000 .100

DO YOU WISH TO HAVE PRINTED TABLE OF RELIABILITY RESULTS
ANSWER YES OR NO
YES

AND GAIN RESULTS - ANSWER YES OR NO

DO YOU WISH MTF AND RELIABILITY AT MTF RESULTS PRINTED
ANSWER YES OR NO
YES

DO YOU WANT PRINTED RESULTS OF THE MAXIMUM MISSION
TIME CALCULATIONS - ANSWER YES OR NO
YES

TYPE IN THE VARIABLE THAT IS TO BE USED 


CALCULATIONS FOR EQUATION 1 (NI MEANS NOT INPUTED)

PARAMETER IS K


lambda 

ni 


MU 

.0900000 


. 1000000+01 


RV 

0 000 00+0 


W P MUT 

__1 .1000000+01 -. NI. 


LAMT    REL         UNREL       SIMREL      SIMGAIN       SIMRIF
.000    1.0000000   .0000000    1.0000000   .1000000+01   .1000000+36
.100    .9967989    .0032011    .9048374    .1101633+01   .2972798+02
.200    .9794141    .0205859    .8187307    .1196259+01   .8805495+01
inverse dormancy factor of 10 and the applicable coverage factor being 0.99, 
it is required to evaluate the system survival probability in steps of 1/10 of a 
year for a maximum mission duration of 12 years. It is required that the 
system reliability be compared against the simplex or nonredundant system 
and that all these results be tabulated and also plotted. It is further required 
that the mean-life of the system as well as the reliability at the mean-life be 
computed. It is of interest to know the maximum mission duration that is 
possible while sustaining some fixed system reliability objective and to 
display the sensitivity of this mission duration with respect to variations in 
the tolerable mission reliability. 
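
The maximum mission duration for a fixed reliability objective can be found numerically by inverting the reliability function. The Python sketch below does this by bisection, assuming only that the reliability is continuous and non-increasing in time. It is illustrated with a simplex system, for which the closed form -ln(Rmin)/LAMBDA is available as a check. The function name and arguments are hypothetical, and this is not a description of how CARE itself performs the search.

```python
import math


def max_mission_time(reliability, r_min, t_hi=1.0, tol=1e-9):
    """Largest t with reliability(t) >= r_min, found by bisection.

    Assumes reliability(t) is continuous and non-increasing in t.
    """
    if not 0.0 < r_min <= reliability(0.0):
        raise ValueError("r_min must satisfy 0 < r_min <= reliability(0)")
    # Grow the upper bracket until the objective is no longer met.
    while reliability(t_hi) >= r_min:
        t_hi *= 2.0
    t_lo = 0.0
    for _ in range(200):  # cap the iterations so the loop always ends
        if t_hi - t_lo <= tol:
            break
        mid = 0.5 * (t_lo + t_hi)
        if reliability(mid) >= r_min:
            t_lo = mid
        else:
            t_hi = mid
    return t_lo


# Sensitivity of the mission duration to the tolerable mission reliability,
# shown for a simplex system with LAMBDA = 0.5 failures per year.
lam = 0.5
for r_min in (0.9, 0.8, 0.7, 0.6, 0.5):
    t_max = max_mission_time(lambda t: math.exp(-lam * t), r_min)
    print(f"R_min = {r_min:.1f}  T_max = {t_max:.4f} years")
```

Sweeping the reliability objective in this way gives the kind of sensitivity display described above; any other monotone reliability model can be passed in place of the simplex one.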

It is also required that the above analysis be carried out for the case where 
three standby spares are provided and these configurations of three and two 
spares be compared and the various comparative measures of reliability be 
evaluated and displayed. 

The above problem formulation is entered into CARE by stating that 
Equation 2 (which models standby spare systems) is required, and the 
pertinent data (S = 2, 3; Z = 8; K = 10; T = 12.0; LAMBDA = 0.5; C = 
0.99; STEP = 0.1; OPTION = 2) are inserted into CARE between the delimiters 
$VAR ... $END using the VAR namelist. 
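
A rough Python sketch of the same study is given below. The closed form used for the standby-sparing module is an assumption: the exact Equation 2 stored in CARE is defined in the cited references and may differ. For C = 1 and S = 1 the assumed form reduces to exp(-LAMBDA*T) * (1 + K*(1 - exp(-MU*T))), with MU = LAMBDA/K. The mean life is approximated by a trapezoidal integral up to a finite upper limit, in the spirit of the B parameter described in Figure 1. All function names are hypothetical.

```python
import math


def standby_reliability(t, lam, k, c, s):
    """Assumed closed form for one powered unit with s dormant spares:
    exp(-lam*t) * (1 + sum_{i=1..s} (c*y)**i / i! * prod_{j<i} (k + j)),
    where y = 1 - exp(-mu*t) and mu = lam / k.
    """
    mu = lam / k                      # dormant failure rate, from K = LAMBDA/MU
    y = 1.0 - math.exp(-mu * t)
    total, term = 1.0, 1.0
    for i in range(1, s + 1):
        term *= c * y * (k + i - 1) / i   # builds the i-th term from the previous one
        total += term
    return math.exp(-lam * t) * total


def system_reliability(t, lam, k, c, s, z):
    """z identical standby-spared modules in series."""
    return standby_reliability(t, lam, k, c, s) ** z


def mean_life(reliability, upper, steps=20000):
    """Trapezoidal approximation of the integral of reliability over [0, upper]."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for n in range(1, steps):
        total += reliability(n * h)
    return total * h


# Problem data from the text: Z = 8, K = 10, LAMBDA = 0.5, C = 0.99, S = 2 and 3.
lam, k, c, z = 0.5, 10.0, 0.99, 8
for s in (2, 3):
    model = lambda t, s=s: system_reliability(t, lam, k, c, s, z)
    mtf = mean_life(model, upper=60.0)   # 60 years is an assumed integration limit
    print(f"S = {s}: R(1.0) = {model(1.0):.6f}, mean life = {mtf:.4f} years, "
          f"R(mean life) = {model(mtf):.6f}")
```

The tabulation in steps of 0.1 year requested in the problem would follow by evaluating the model at T = 0.0, 0.1, ..., 12.0 in the same loop.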

The above example illustrates the complexity of problems that may be 
posed to CARE, and the simplicity with which the specifications are 
entered. The reliability theoretic functions to be performed on the above 
specified system are acknowledged interactively by responding YES or NO 
on the demand terminal to CARE’s questions at the time it so requests. A 


[Plot: system reliability (ordinate, 0.00 to 1.00) against R(SIMPLEX) = EXP(-LAMBDA x T) (abscissa, up to 1.00)]

Figure 2. A sample plot by CARE 
partial sample run illustrating the question/answer segment of CARE is 
shown in Figure 1. A sample reliability plot generated by CARE for the 
hybrid (3,S) system for S = 6, 4, 2, and 1 at K = 1 is given in Figure 2. 